PESA: Phrase Pair Extraction as Sentence Splitting
نویسنده
چکیده
Most statistical machine translation systems use phrase-to-phrase translations to capture local context information, leading to better lexical choice and more reliable local reordering. The quality of the phrase alignment is crucial to the quality of the resulting translations. Here, we propose a new phrase alignment method, not based on the Viterbi path of word alignment models. Phrase alignment is viewed as a sentence splitting task. For a given spitting of the source sentence (source phrase, left segment, right segment) find a splitting for the target sentence, which optimizes the overall sentence alignment probability. Experiments on different translation tasks show that this phrase alignment method leads to highly competitive translation results.
منابع مشابه
Shallow Language Processing Architecture for Bulgarian
This paper describes LINGUA an architecture for text processing in Bulgarian. First, the pre-processing modules for tokenisation, sentence splitting, paragraph segmentation, partof-speech tagging, clause chunking and noun phrase extraction are outlined. Next, the paper proceeds to describe in more detail the anaphora resolution module. Evaluation results are reported for each processing task.
متن کاملA Simple Sentence-Level Extraction Algorithm for Comparable Data
The paper presents a novel sentence pair extraction algorithm for comparable data, where a large set of candidate sentence pairs is scored directly at the sentence-level. The sentencelevel extraction relies on a very efficient implementation of a simple symmetric scoring function: a computation speed-up by a factor of 30 is reported. On Spanish-English data, the extraction algorithm finds the h...
متن کاملDiscriminative Modeling of Extraction Sets for Machine Translation
We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principle advantages over word-factored alignment models. First, we can incorporate featur...
متن کاملSyntax Augmented Machine Translation via Chart Parsing with Integrated Language Modeling
We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniq...
متن کاملSyntax Augmented Machine Translation via Chart Parsing with Integrated Language Modeling
We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniq...
متن کامل